Plagiarism Detection in Obfuscated Documents Using an N-gram Technique

Author

  • Tomáš Kučečka
Abstract

Plagiarism is a major problem today, especially in academic institutions. Students often present someone else's work as their own and receive credit for it. Plagiarism checking therefore has to be integrated into the submission process, which involves the use of plagiarism detection software. In this paper we focus on the vulnerabilities of such software to different kinds of obfuscation. We briefly introduce our statistical approach, which should detect obfuscated documents fairly well when a larger portion of the document is obfuscated. We show the limits of this method and test the effectiveness of the n-gram technique for detecting obfuscated documents. By varying the parameters used when dividing text into n-grams, we test how well the n-gram method detects similarity, and we look for the parameter values that give the best detection results. We present the experiments we carried out and derive conclusions.
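To make the idea of dividing text into n-grams concrete, here is a minimal sketch of one common way to compare two documents by their word n-grams. It is not the method from the paper: the function names `word_ngrams` and `containment`, the tokenization rule, and the choice of a containment score are all assumptions made for illustration. The parameter `n` stands in for the kind of parameter the abstract says is varied.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    # An empty set results when the text has fewer than n words.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also occur in source.

    Illustrative score only (an assumption, not the paper's measure).
    """
    a = word_ngrams(suspicious, n)
    b = word_ngrams(source, n)
    if not a:
        return 0.0
    return len(a & b) / len(a)


source = "the quick brown fox jumps over the lazy dog"
obfuscated = "the quick brown cat jumps over the lazy dog"

# Replacing one word breaks every n-gram that spans it, so larger n
# is more sensitive to this kind of obfuscation.
for n in (1, 2, 3):
    print(n, containment(obfuscated, source, n))
```

In this toy example a single substituted word lowers the score more as `n` grows, which hints at why the choice of n-gram size matters when documents are obfuscated.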


Similar resources

ENCOPLOT: Pairwise Sequence Matching in Linear Time Applied to Plagiarism Detection

In this paper we describe a new general plagiarism detection method that we used in our winning entry to the 1st International Competition on Plagiarism Detection, in the external plagiarism detection task, which assumes the source documents are available. In the first phase of our method, a matrix of kernel values is computed, which gives a similarity value based on n-grams between each source a...


Machine Translation Evaluation Metric for Text Alignment

As plagiarisers become cleverer, plagiarism detection becomes harder. Plagiarisers will find new ways to obfuscate the plagiarized passages so that humans and automatic plagiarism detectors are not able to point them out. So, a plagiarism detection system needs to be robust enough to detect plagiarism, no matter what obfuscation techniques have been applied. Our system attempts to do the same b...


A plagiarism detection procedure in three steps: selection, matches and "squares"

We present a detailed description of an algorithm tailored to detect external plagiarism in PAN-09 competition. The algorithm is divided into three steps: a first reduction of the size of the problem by a selection of ten suspicious plagiarists using a n-gram distance on properly recoded texts. A search for matches after T9-like recoding. A “joining algorithm” that merges selected matches and i...


Automatic Plagiarism Detection Using Word-Sentence Based S-gram

Plagiarism is an academic problem that is caught more and more each year. A common trick that cheaters use is inserting or removing a few terms, sentences, or paragraphs in the original copy to make the reader believe that the plagiarized copy and the original copy are unalike. This paper provides a new way to detect plagiarism by checking the similarity between sentences, and par...


Intrinsic Plagiarism Detection Using Character n-gram Profiles

The task of intrinsic plagiarism detection deals with cases where no reference corpus is available and it is exclusively based on stylistic changes or inconsistencies within a given document. In this paper a new method is presented that attempts to quantify the style variation within a document using character n-gram profiles and a style change function based on an appropriate dissimilarity mea...




Publication date: 2011